Utility Theory
How We Make Decisions and Behave (Dan Gilbert)
$$ \begin{aligned} \mathbb{E}\left[ \text{Gain} \right] &= \sum_{\text{gain}} P[\text{gain}] \cdot U(\text{gain}) \\ \text{Expected Gain} &= \sum (\text{Odds of Gain}) \times (\text{Value of Gain}) \end{aligned} $$

- $P$: Odds of gain. Memory affects our evaluation.
- $U$: Value of gain. Comparison affects our evaluation.
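The expected-gain formula above can be sketched in a few lines of Python; the outcomes and odds below are made-up illustration values, not from the text:

```python
# Expected gain: sum over outcomes of P[gain] * U(gain).
# The (odds, value) pairs below are hypothetical.
outcomes = [
    (0.7, 10.0),  # 70% chance of gaining 10
    (0.3, 50.0),  # 30% chance of gaining 50
]

# Expected gain = 0.7 * 10 + 0.3 * 50 = 22
expected_gain = sum(p * u for p, u in outcomes)
print(expected_gain)
```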
But people systematically make two kinds of errors:
- Errors in estimating odds: we underestimate the odds of our future pains.
- Errors in estimating values: we overestimate the value of our present pleasures.
People usually think: "What space is to size, time is to value." -- Plato
- More is better.
- Now is better than later.
Utility Function / Value Function
$$ U : S \to \mathbb{R} $$

- $S$: Real-world states, i.e. the things we have preferences over.
Optimal Behavior
$$ \max_{a \in A} \sum_{s \in S} P[s|a]U(s) $$

- $A$: Possible Actions
- $S$: Possible States
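The argmax above can be sketched as follows; the actions, state distributions, and utilities are hypothetical illustration values:

```python
# Optimal behavior: choose the action a maximizing sum_s P[s|a] * U(s).
# All numbers below are hypothetical.
P = {  # P[s|a]: distribution over states induced by each action
    "umbrella":    {"dry": 1.0, "wet": 0.0},
    "no_umbrella": {"dry": 0.6, "wet": 0.4},
}
U = {"dry": 10.0, "wet": -20.0}  # utility of each state

def expected_utility(action):
    return sum(p * U[s] for s, p in P[action].items())

# "umbrella" yields 10; "no_umbrella" yields 0.6*10 + 0.4*(-20) = -2
best = max(P, key=expected_utility)
print(best, expected_utility(best))
```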
Problem
Utility is not natural. Preferences are!
Theorem: Given a plausible set of assumptions about your preferences, there must exist a consistent utility function.
- $A \succeq B$: Weakly prefers $A$ to $B$.
- $A \succ B$: Strictly prefers $A$ to $B$.
- $A \sim B$: Indifferent between $A$ and $B$.
Reference: Utility Theory
Preference Axioms
Completeness: $\forall A, B: (A \succ B) \vee (A \prec B) \vee (A \sim B)$
Transitivity: $ (A \succ B) \wedge (B \succ C) \Rightarrow (A \succ C) $
Monotonicity: $ A \succ B \wedge p \geq q \Rightarrow [p: A, (1 - p): B] \succeq [q: A, (1 - q): B] $
"If you prefer \$1000 to nothing, you should prefer a 90% chance of getting \$1000 to a 50% chance of getting \$1000."
Substitutability: $ A \sim B \Rightarrow [p: A, (1 - p): C] \sim [p: B, (1 - p): C] $
"If you are indifferent between an apple and a banana, then a 30% chance of winning an apple looks the same to you as a 30% chance of winning a banana."
Decomposability: $[p: A, (1-p): [q: B, (1-q): C]] \sim [p: A, (1-p)q: B, (1-p)(1-q): C]$.
Continuity: $A \succ B \succ C \Rightarrow \exists p: [p: A, (1-p): C] \sim B$.
Suppose a preference relation $\succeq$ satisfies the axioms above. Then there exists a utility function $u$ that represents $\succeq$.
Relation: $$ \begin{aligned} A \succeq B &\Leftrightarrow U(A) \geq U(B) \\ A \succ B &\Leftrightarrow U(A) > U(B) \\ A \sim B &\Leftrightarrow U(A) = U(B) \end{aligned} $$

Lottery: $$ U([p_1: O_1, \dots , p_k: O_k]) = \sum_{i=1}^{k} p_i U(O_i) $$
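A minimal sketch of the lottery utility formula, which also checks the decomposability axiom numerically (outcome utilities and probabilities are hypothetical):

```python
# Utility of a lottery [p1: O1, ..., pk: Ok] = sum_i p_i * U(O_i).
# Outcome utilities below are hypothetical.
U = {"A": 3.0, "B": 1.0, "C": 0.0}

def lottery_utility(lottery):
    """lottery: list of (probability, outcome) pairs summing to 1."""
    return sum(p * U[o] for p, o in lottery)

# Decomposability: [p: A, (1-p): [q: B, (1-q): C]] is equivalent to the
# flattened lottery [p: A, (1-p)q: B, (1-p)(1-q): C].
p, q = 0.5, 0.4
compound = p * U["A"] + (1 - p) * lottery_utility([(q, "B"), (1 - q, "C")])
flat = lottery_utility([(p, "A"), ((1 - p) * q, "B"), ((1 - p) * (1 - q), "C")])
print(compound, flat)  # both equal 0.5*3 + 0.2*1 + 0.3*0 = 1.7
```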
Value of Information
Expected utility of the best action given evidence $e$: $$ \mathbb{E}\left[ U_{E}(A|e) \right] = \max_{a\in A} \sum_{i} P[S_i|e, a] \cdot U(S_i) $$

Expected utility given new evidence $e'$: $$ \mathbb{E}\left[ U_{E, E'}(A|e, e') \right] = \max_{a\in A} \sum_{i} P[S_i|e, e', a] \cdot U(S_i) $$

Value of knowing $E'$ (Value of Perfect Information, VPI): $$ \begin{aligned} \text{VPI}_{E}(E') &= \mathbb{E}\left[ U_{E,E'}(A'|e,E')\right] - \mathbb{E}\left[ U_{E}(A|e) \right] \\ &= \left(\sum_{e'} P[e'|e] \cdot \mathbb{E}\left[ U_{E,E'}(A'|e,e')\right] \right) - \mathbb{E}\left[ U_{E}(A|e) \right] \\ &= \text{Expected Utility Given New Information} - \text{Previous Expected Utility} \end{aligned} $$

Here "utility" means the best achievable expected payoff: it does not depend on any particular action, only on the available action space.
Example (Time = Cost = -Utility)
Cost of taking highway to work when there isn't traffic: 15
Cost of taking highway to work when there is traffic: 30
Cost of taking local roads to work: 20
Odds of traffic: 0.15
Expected Utility (since Utility $= -$Cost, maximizing expected utility means minimizing expected cost): $$ \mathbb{E}\left[ \text{Cost} \right]_{\text{highway}} = 0.15\times 30 + 0.85 \times 15 = 17.25 < 20 = \mathbb{E}\left[ \text{Cost} \right]_{\text{local}} $$ so without traffic information the highway is chosen, with expected cost 17.25.
Expected Utility Given New Info: $$ \mathbb{E}\left[ U_T(A|T) \right] = \max_{a \in A} \sum_{t} P[t|T] \cdot U(a|t, T) $$
- When $T = 1$, local road is chosen. Expected utility is 20.
- When $T = 0$, highway is chosen. Expected utility is 15.
VPI $= \mathbb{E}\left[ U_T(A|T) \right] - \mathbb{E}\left[ U(A) \right]$. With the traffic information, the expected cost is $0.15 \times 20 + 0.85 \times 15 = 15.75$, so in cost terms VPI $= 17.25 - 15.75 = 1.5$.
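The traffic example can be checked directly with the numbers from the text (utility $= -$cost):

```python
# Traffic example: costs from the text, utility = -cost.
P_TRAFFIC = 0.15
cost = {  # cost[action][traffic?]
    "highway": {True: 30.0, False: 15.0},
    "local":   {True: 20.0, False: 20.0},
}

def expected_cost(action, p=P_TRAFFIC):
    return p * cost[action][True] + (1 - p) * cost[action][False]

# Without information: commit to the single action with lowest expected cost.
cost_no_info = min(expected_cost(a) for a in cost)  # highway: 17.25
# With perfect information: pick the best action in each realized state.
cost_with_info = (P_TRAFFIC * min(cost[a][True] for a in cost)          # local: 20
                  + (1 - P_TRAFFIC) * min(cost[a][False] for a in cost))  # highway: 15
vpi = cost_no_info - cost_with_info  # 17.25 - 15.75 = 1.5
print(cost_no_info, cost_with_info, vpi)
```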
In practice, information is partial (what we observe cannot pin down the exact state of the world) and imperfect (what we observe might not be reliable).
VPI tells you how much you should pay for one exact piece of information. However, it can be myopic: a single piece of information may be worthless on its own even though several pieces together would be valuable.
Conclusion
Decision theory provides a framework for optimal decision making.
The principle is: maximize expected utility.